Abstract. Consider the classical problem of predicting the next bit in a sequence of bits. A standard performance measure is regret (loss in payoff) with respect to a set of experts. For example, if we measure performance with respect to two constant experts, one that always predicts 0's and another that always predicts 1's, it is well known that one can get regret $O(\sqrt{T})$ with respect to the best expert by using, say, the weighted majority algorithm [1]. But this algorithm does not provide a performance guarantee in any interval. There are other algorithms (see [2,3,4]) that ensure regret $O(\sqrt{x \log T})$ in any interval of length $x$. In this paper we show a randomized algorithm that in an amortized sense gets a regret of $O(\sqrt{x})$ for any interval when the sequence is partitioned into intervals arbitrarily. We empirically estimated the constant in the $O()$ for $T$ up to 2000 and found it to be small, around 2.1. We also experimentally evaluate the efficacy of this algorithm in predicting high frequency stock data.

1 Introduction

Consider the following classical game of predicting a binary ±1 sequence. An algorithm A sees a binary sequence $\{b_t\}_{t \ge 1}$, one bit at a time, and attempts to predict the next bit $b_t$ from the past history $b_1, \ldots, b_{t-1}$. The payoff $A_T$ of the algorithm in $T$ steps is the number of correct guesses minus the number of wrong guesses. In other words, let $\hat{b}_t \in [-1, 1]$ be the prediction for the $t$-th bit based on the previous bits; then:

$$A_T := \sum_{1 \le t \le T} \hat{b}_t b_t.$$

The payoff per time step $\hat{b}_t b_t$ is essentially equivalent to the well known absolute loss function $|\hat{b}_t - b_t|$ (see for example [5], chapter 8): when $|b_t| = 1$, $|\hat{b}_t - b_t| = |b_t| |\hat{b}_t - b_t| = |1 - \hat{b}_t b_t| = 1 - \hat{b}_t b_t$. Thus the absolute loss function is the negative of our payoff in one step plus a shift of 1. Also, $b_t$ values from $\{-1, 1\}$ or $\{0, 1\}$ are equivalent by a simple scaling and shifting transform.
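
The relation between the payoff and the absolute loss can be checked numerically. The following is a minimal sketch; the function names and the sample data are illustrative assumptions, not part of the original text.

```python
def payoff(bits, predictions):
    """Total payoff A_T: sum over t of prediction_t * bit_t."""
    return sum(p * b for p, b in zip(predictions, bits))


def absolute_loss(bits, predictions):
    """Total absolute loss: sum over t of |prediction_t - bit_t|."""
    return sum(abs(p - b) for p, b in zip(predictions, bits))


# Hypothetical example data: bits in {-1, +1}, predictions in [-1, 1].
bits = [1, -1, -1, 1, 1]
predictions = [0.5, -1.0, 0.25, 1.0, -0.5]

T = len(bits)
total_payoff = payoff(bits, predictions)
total_loss = absolute_loss(bits, predictions)
# Per step, |pred - bit| = 1 - pred * bit, so the totals differ by T.
```

Summed over $T$ steps, the absolute loss equals $T$ minus the payoff, which is why minimizing one is the same as maximizing the other.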
One can view this game as an idealized "stock prediction" problem as follows.
In each unit time, the stock price goes up or down by precisely 1%, and the 
algorithm bets on this event. If the bet is right, the player wins one dollar, 
and otherwise loses one dollar. Not surprisingly, in general, it is impossible to 
guarantee a positive payoff for all possible scenarios (sequences). However, one 
could hope to give some guarantees on the payoff of the algorithm based on 
certain properties of the sequence. 

For example, one can compare the payoff to the better of two choices (experts), which correspond to two constant algorithms: the first one, where $\hat{b}_t = +1$, and the second one, where $\hat{b}_t = -1$ for all $t$. Note that the best of these experts gets payoff $|\sum_{1 \le t \le T} b_t|$, which corresponds to the "optimal in hindsight" expert among the two choices. The regret of an algorithm is defined as how much worse the algorithm performs compared to the best of the two experts (in hindsight, after seeing the sequence). This has been studied in a number of papers, including [6,1,7,8,9]. A classical result says that one can obtain a regret of $O(\sqrt{T})$ for a sequence of length $T$, via, say, the weighted majority algorithm [1]. Formally, for a sequence $X = b_1, \ldots, b_T$, let $h(X) = \sum_{1 \le t \le T} b_t$ denote the "height" of the sequence when plotted cumulatively as a chart. Then we have the following theorem:



Theorem 1. There is an algorithm that achieves payoff $\ge |h(X)| - \alpha\sqrt{T}$. It is also known that the optimal value of $\alpha \to \sqrt{2/\pi}$ as $T \to \infty$.

However, an algorithm that only focuses on the overall regret does not exploit short term trends in the sequence and only relies on a 'global' long term bias in the full string. Consider for example a sequence that may not have a high overall bias but has many intervals in which there may be a high level of bias. Our result is that for any partitioning of the sequence into intervals, one can essentially get a regret proportional to $\sqrt{x}$ for each interval of length $x$ in an amortized sense (Theorem 3). Although our results are stated for bits, they work even when $b_t$ is a real number in $[-1, 1]$. We note that even though similar bounds have been obtained before ([2,3,4] and, more recently, [11,12]), the penalty on an interval of length $x$ is $O(\sqrt{x \log T})$ in these previous results.

The bit prediction problem we consider is closely related to the two experts problem (or multi-armed bandits problem with full information). In each round each expert has a payoff in the range $[0, 1]$ that is unknown to the algorithm. For two experts, let $b_{1t}, b_{2t}$ denote the payoffs of the two experts at time $t$. The algorithm pulls each arm (expert) with probability $\hat{b}_{1t}, \hat{b}_{2t} \in [0, 1]$ respectively, where $\hat{b}_{1t} + \hat{b}_{2t} = 1$. The payoff of the algorithm in this setting is

$$A_T := \sum_{t=1}^{T} b_{1t}\hat{b}_{1t} + b_{2t}\hat{b}_{2t}.$$

We will be concerned with the following payoff function in this paper: 

Definition 2 (Interval payoff function: $P_\alpha$)

Let $X_1, \ldots, X_k$ denote a partition of the sequence $X$ into a disjoint union of $k$ intervals, that is, $X$ is the concatenation of these $k$ subsequences. We will use $h(X_i)$ to denote the sum of the bits in the interval $X_i$ and $|X_i|$ to denote the length of $X_i$.

The interval payoff function $P_\alpha(X)$ is defined as the maximum value of the expression

$$\sum_{i=1}^{k} \left( |h(X_i)| - \alpha\sqrt{|X_i|} \right)$$

over all $1 \le k \le |X|$ and all partitions $X_1, \ldots, X_k$ of $X$.
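
For small sequences, Definition 2 can be evaluated directly by enumerating every partition. The sketch below does exactly that; it is exponential in the sequence length and is meant only as an illustration of the definition (the function name is an assumption).

```python
import math
from itertools import combinations


def interval_payoff_bruteforce(bits, alpha):
    """Maximum over all partitions of bits into contiguous intervals of
    sum_i (|h(X_i)| - alpha * sqrt(|X_i|)). Exponential time: small inputs only."""
    T = len(bits)
    best = -math.inf
    for k in range(1, T + 1):
        # Choose k - 1 cut positions strictly inside the sequence.
        for cuts in combinations(range(1, T), k - 1):
            bounds = (0, *cuts, T)
            value = 0.0
            for lo, hi in zip(bounds, bounds[1:]):
                value += abs(sum(bits[lo:hi])) - alpha * math.sqrt(hi - lo)
            best = max(best, value)
    return best


# Four +1 bits followed by four -1 bits: the overall height is 0, but each
# half has height 4, so splitting in the middle pays (4 - 2) + (4 - 2) = 4.
example = [1, 1, 1, 1, -1, -1, -1, -1]
example_value = interval_payoff_bruteforce(example, alpha=1.0)
```

The example shows why the interval payoff can be much larger than $|h(X)| - \alpha\sqrt{T}$ on sequences with strong local trends but no global bias.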

We say that a payoff function $f : \{-1, 1\}^T \to \mathbb{R}$ is feasible if there is a bit prediction algorithm which on sequence $X$ achieves payoff at least $f(X)$.

Theorem 3. (Main Theorem) There is an absolute constant $\alpha < 10$ such that the payoff function $P_\alpha$ is feasible.

For the two experts problem our result translates to the following guarantee:




Here $\sum_{t \in X_i} b_{jt}$ is the payoff of the $j$-th expert in the interval $X_i$.

This can be viewed as incurring a penalty of $\alpha\sqrt{|X_i|}$ for each interval $X_i$. We theoretically show that the optimal value of $\alpha$ is at most 10 (Section 2). We empirically estimated the optimal $\alpha$ for $T$ up to 2000 and found it to be small, around 2.1 (Section A.1).



We stress here that the algorithm doesn't need to know the partition or the length of the partition in advance. We also note that our guarantee does not hold for each interval individually; rather, when we look at the net payoff in an amortized sense, we may account for a regret of at most $\alpha\sqrt{x}$ for an interval of length $x$. In fact, the guarantee is impossible to achieve in a non-amortized sense. We show that if we measure regret based on the performance of an algorithm in a given interval then one will have to trade off regrets at different time scales.



Observation 4 (Observation 16) There is no prediction algorithm that can guarantee a regret of $O(\sqrt{|Y|})$ on all intervals $Y$ for all input sequences.

Regarding the computation of $P_\alpha$, we show:

Theorem 5. (Theorem 14) The value of $P_\alpha(S)$ for a particular sequence $S$ of length $T$ can be computed using dynamic programming in time $O(T^3)$.

For a given $T$, let $\alpha_0(T)$ denote the minimum $\alpha$ such that $P_\alpha$ is feasible for all sequences of length $T$. It is possible to determine $\alpha_0$ using the following well known observation by Cover.

Observation 6 (Cover [6]) A payoff function $f : \{-1, 1\}^T \to \mathbb{R}$ is feasible if and only if $E_S[f(S)] \le 0$, where $S$ is a uniformly random sequence in $\{-1, 1\}^T$.

This is achieved by a prediction algorithm that predicts

$$\hat{b}_t = \frac{E_U[f(s.1.U)] - E_U[f(s.(-1).U)]}{2}$$

where $s$ is the sequence of bits seen so far, $U$ is a suffix sequence chosen uniformly at random and $s.b.U$ denotes the concatenated sequence starting with $s$ followed by bit $b$ followed by the sequence $U$. Note that $\hat{b}_t \in [-1, 1]$ as long as for all $s$, $|E_U[f(s.1.U)] - E_U[f(s.(-1).U)]| \le 2$.
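
For very small $T$ the prediction rule of Observation 6 can be run exactly by enumerating all suffixes. The sketch below does this; the helper names and the example payoff function are assumptions made for illustration. With exact expectations the per-step payoffs telescope, so the total payoff on $X$ equals $f(X) - E_S[f(S)]$.

```python
from itertools import product


def expected_completion(f, prefix, T):
    """E_U[f(prefix . U)] over uniformly random suffixes U, by exact enumeration."""
    remaining = T - len(prefix)
    total = 0.0
    for suffix in product((-1, 1), repeat=remaining):
        total += f(tuple(prefix) + suffix)
    return total / 2 ** remaining


def cover_prediction(f, seen, T):
    """Prediction for the next bit: half the difference of the two expectations."""
    up = expected_completion(f, list(seen) + [1], T)
    down = expected_completion(f, list(seen) + [-1], T)
    return (up - down) / 2


def run_cover(f, bits):
    """Total payoff of the rule above on the sequence bits."""
    T = len(bits)
    total = 0.0
    for t in range(T):
        total += cover_prediction(f, bits[:t], T) * bits[t]
    return total


def height_payoff(S):
    """Example payoff for T = 4: |h(S)| - 1.5, where 1.5 = E|B_4|, so E[f] = 0."""
    return abs(sum(S)) - 1.5
```

For instance, `run_cover(height_payoff, [1, 1, 1, 1])` returns the payoff on the all-ones sequence, which coincides with `height_payoff((1, 1, 1, 1))` because the expectation of `height_payoff` over random sequences is zero.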



Algorithm and Running time: Theorem 5 and Observation 6 suggest a simple algorithm for achieving payoff function $P_\alpha$. Take the sequence $s$ seen so far, append a $+1$ and then a random sequence to make it into a complete sequence of length $T$. Compute $P_\alpha(S)$ for the resulting sequence $S$. Do this again, replacing the $+1$ by a $-1$. Predict $\hat{b}_t$ to be half of the difference between the two cases.

We note that a deterministic algorithm achieving the guarantee of Theorem 3 may take exponential time since it would need to find $P_\alpha(S)$ for every random completion of the bits seen so far. Alternatively, there is a simple randomized algorithm which achieves the same payoff in expectation by taking a different random completion for every prefix. A naive implementation of this randomized algorithm will take $O(T^3)$ time for each bit being predicted. We show a simple variant that reduces this to $O(\log T)$ time with pre-computation.



Theorem 7. (Theorem 15) There is a randomized algorithm that achieves the payoff guarantee $P_\alpha$ of Theorem 3 in expectation and spends $O(T^2)$ time per step. There is also a randomized algorithm that achieves payoff $P_{\alpha'}$ with $\alpha' = c\alpha$ and spends only $O(\log T)$ time per step. Here $c := \frac{\sqrt{2}}{\sqrt{2} - 1}$.

Both algorithms above use pre-computed information that takes $O(T^2)$ space and is computed in $O(T^4)$ time.

Generalization to real numbers: We show that a variant of the guarantee holds in a semi-adversarial model where a string of real numbers may be chosen instead of bits. The model combines worst case and average case settings: the signs of the real numbers may be chosen adversarially (that is, in the worst case) but the magnitudes of the real numbers come from a pre-specified distribution independently and randomly (Theorem 17).

Experimental results: We implement our algorithm, the weighted majority algorithm, an algorithm based on Autoregressive Integrated Moving Average (ARIMA) and an algorithm of [12], and compare their performance when predicting financial time series data. Specifically, we consider the high frequency price data of 5 stocks, and we apply these algorithms to predict the per minute price changes in an online fashion, taking the values in each day as a separate sequence. That is, we predict the next minute returns of mid-prices for each stock based on its previous 1 minute returns in the day. We perform this experiment over 189 trading days for each stock and find that on average our algorithm performs better than other prediction algorithms based on regret minimization but is outperformed by the ARIMA algorithm. On the other hand, as we discussed above, our algorithm has certain provable guarantees for every sequence which the ARIMA algorithm lacks. The experimental setup and results are described in more detail in Section A.

1.1 Related work 

There is a large body of work on regret style analysis for prediction. Numerous works including [6,10] have examined the optimal amount of regret achievable with respect to two or more experts. A good reference for the results in this area is [5]. It is well known that in the case of static experts, the optimal regret achievable is exactly equal to the Rademacher complexity of the predictions of the experts (chapter 8 in [5]). Recent works such as [13,14,15] have extended this analysis to other settings. Measures other than the standard regret measure have been studied in [16]. The question of what can be achieved if one would like to have a significantly better guarantee with respect to a fixed expert or a distribution of experts was asked before in [17,12]. Tradeoffs between regret and minimum payoff were also examined in [18], where the author studied the set of values of $a, b$ for which an algorithm can have payoff $a \cdot \mathrm{OPT} + b \log N$, where OPT is the payoff of the best arm and $a, b$ are constants.

Regret minimization algorithms with performance guarantees within each interval have been studied in [2,3,4] and more recently in [11,12]. As we mentioned, some of these algorithms achieve a regret of $O(\sqrt{x \log T})$ for every interval of size $x$ in a sequence of length $T$. A related work which also seeks to exploit short term trends in the sequence is [19], where the regret bound is proportional to $\sqrt{Tk}$ in the best case, where $k$ is the number of intervals (see [5], Corollary 5.1). The main difference between the work of [19] and our results is that their algorithm requires fixing the number of intervals, $k$, in advance whereas our algorithm works simultaneously for all $k$. Also note that their regret guarantee is never smaller than the regret implied by the payoff function $P_\alpha$ for a sequence of length $T$, with equality only in the special case when all intervals are of equal length $T/k$.

Numerous papers (for example [20,21,22]) have implemented algorithms inspired by regret style analysis and applied them to financial and other types of data.

1.2 Overview of the proof 

In this section we give a high level idea of our proof; the formal proof appears in Section 2.

To prove the main theorem we want to compute the minimum $\alpha$ such that $E_S[P_\alpha(S)] \le 0$ (see Observation 6). We first introduce a variant of the payoff function $P_\alpha(S)$ as follows. Instead of computing the maximum value of $\sum_i |h(X_i)| - \alpha\sqrt{|X_i|}$ over all possible partitions, we will only allow partitions where the intervals are of the form $(2^i j, 2^i (j + 1)]$; that is, intervals that are obtained by dividing the string into segments whose length is some power of 2. We will refer to such intervals as 'aligned' intervals (Definition 11). Further, we will only look at values of $T$ that are powers of 2. Note that any interval can be broken into at most $\log T$ aligned intervals. Let $P^A_\alpha(S)$ denote the maximum value of $\sum_i |h(X_i)| - \alpha\sqrt{|X_i|}$ over partitions into aligned intervals. We first show that

Lemma 1. (Lemma 2) If $E[P^A_\alpha(S)] \le 0$ then $E[P_{c\alpha}(S)] \le 0$, where $c := \frac{\sqrt{2}}{\sqrt{2} - 1}$.

Next we show

Theorem 8. (Theorem 13) There is an absolute constant $\alpha < 2.8$ such that $E[P^A_\alpha(S)] \le 0$.

We prove Theorem 8 recursively for $T$ that are increasing powers of 2. We inductively show that the distribution of $P^A_\alpha(S)$ is stochastically upper bounded by a shifted exponential distribution (Definition 12) with certain parameters (Equation 2.1), where $S$ is a uniformly random sequence of length $T$. Since we are dealing with splits into aligned intervals, we can assume that either the best split for $S$ is the whole interval, or the mid-point of $S$ is one of the splitting points. For the first case, we may upper bound the payoff function using Hoeffding's bound (Theorem 10), while for the second case we may inductively assume that the distribution of payoffs for the subsequences is stochastically bounded by a shifted exponential distribution. We then separately bound each of these terms by the shifted exponential distribution.

2 Proof of Main theorem 

2.1 Preliminaries 

Definition 9 (Binomial distribution $B_n$) Let $X_1, X_2, \ldots, X_n \in \{-1, 1\}$ be uniformly and independently distributed. Then the sum

$$\sum_{i=1}^{n} X_i$$

is said to be binomially distributed. We denote the distribution as $B_n$.

Theorem 10. (Hoeffding's bound) [23]

$$\Pr\left[|B_n| \ge y\sqrt{n}\right] \le 2 \exp\left(-\frac{y^2}{2}\right)$$
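
As a sanity check, the two-sided tail of $B_n$ can be computed exactly for moderate $n$ and compared with the bound $2\exp(-y^2/2)$. This is a minimal sketch with illustrative function names.

```python
import math


def binomial_two_sided_tail(n, threshold):
    """Exact Pr[|B_n| >= threshold], where B_n is a sum of n uniform +-1 bits."""
    count = 0
    for ones in range(n + 1):
        height = 2 * ones - n  # 'ones' bits equal +1, the rest equal -1
        if abs(height) >= threshold:
            count += math.comb(n, ones)
    return count / 2 ** n


def hoeffding_bound(y):
    """Right-hand side of the bound: 2 * exp(-y^2 / 2)."""
    return 2 * math.exp(-y * y / 2)


# Compare the exact tail with the bound on a small grid of (n, y).
comparison = {
    (n, y): (binomial_two_sided_tail(n, y * math.sqrt(n)), hoeffding_bound(y))
    for n in (1, 4, 16, 64)
    for y in (0.5, 1.0, 2.0, 3.0)
}
```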

Definition 11 (Aligned interval)

We assume here that $T$ is a power of 2. An aligned interval is one which is obtained by breaking $[1, T]$ into $2^i$ equal parts for $i \in [0, \log T]$ and picking one of the parts. So for instance the first part is always $[1, T/2^i]$.

In other words, an interval $[p + 1, p + x]$ given by $p \in [0, T]$, $x \in [1, T - p]$ as discussed above is said to be an aligned interval if $p = j \cdot 2^i$ and $x = 2^i$ for some $i \in [0, \log T]$ and $j \in [0, T/2^i - 1]$.

We denote the interval payoff function corresponding to Definition 2 which allows only aligned splits as $P^A_\alpha$.

Definition 12 (Shifted Exponential distribution) The probability density function $f_{\mu,\sigma,n}$ of the shifted exponential distribution with mean $\sigma\sqrt{n}$ and shift $\mu\sqrt{n}$ is defined as follows:

$$f_{\mu,\sigma,n}(y) := \frac{1}{\sigma\sqrt{n}} \exp\left(-\frac{y - \mu\sqrt{n}}{\sigma\sqrt{n}}\right) \quad \forall y \ge \mu\sqrt{n}$$

$$f_{\mu,\sigma,n}(y) := 0 \quad \forall y < \mu\sqrt{n}$$

We denote a random variable distributed according to $f_{\mu,\sigma,n}$ as $F_{\mu,\sigma,n}$. That is, $\Pr[F_{\mu,\sigma,n} \ge y] = \int_y^\infty f_{\mu,\sigma,n}(s)\, ds = \exp\left(-\frac{y - \mu\sqrt{n}}{\sigma\sqrt{n}}\right)$ when $y \ge \mu\sqrt{n}$ and 1 otherwise.



2.2 Proof 

Theorem 13. There is an absolute constant $\alpha < 2.8$ s.t. there is an algorithm which achieves payoff at least $P^A_\alpha$ for all $T \ge 1$.

Proof: We need to show that for all $T \ge 1$, $E_{x \in \{-1,1\}^T}[P^A_\alpha(x)] \le 0$. After that, the theorem follows from Observation 6 (it is easy to check that the second condition of Observation 6 is satisfied for $P^A_\alpha$).

We will prove the theorem by induction. We will show that when $n$ is a power of 2,

$$\forall y \in \mathbb{R}: \quad \Pr_{x \in \{-1,1\}^n}[P^A_\alpha(x) \ge y] \le \Pr[F_{\mu,\sigma,n} \ge y] \quad (2.1)$$

for some $\mu := \mu(\alpha)$ and $\sigma := \sigma(\alpha)$. Here $F_{\mu,\sigma,n}$ is as in Definition 12.

Note that this would imply $E_{x \in \{-1,1\}^n}[P^A_\alpha(x)] \le E[F_{\mu,\sigma,n}] = (\mu + \sigma)\sqrt{n}$. We will show that for a suitable choice of $\alpha$, the term $\mu + \sigma \le 0$, and this suffices to prove the theorem.

It remains to prove Equation 2.1. For the base case, $n = 1$, we see that the equation is satisfied for $\mu \ge 1 - \alpha$, $\sigma > 0$. We will now show that it is satisfied for $2n$ whenever it is satisfied for $n$ (for appropriate $\mu$ and $\sigma$).

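Now, for a sequence $x := (x_1, x_2) \in \{-1, 1\}^n \times \{-1, 1\}^n$, $P^A_\alpha(x) = \max\left(P^A_\alpha(x_1) + P^A_\alpha(x_2),\ |h(x)| - \alpha\sqrt{2n}\right)$. So for every $x$ such that $P^A_\alpha(x) \ge y$ we must have either $P^A_\alpha(x_1) + P^A_\alpha(x_2) \ge y$ or $|h(x)| - \alpha\sqrt{2n} \ge y$. Thus,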
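
The recurrence for the aligned payoff translates directly into a short recursive routine. The sketch below is illustrative only (the function name is an assumption); it requires the length of the input to be a power of 2.

```python
import math


def aligned_payoff(bits, alpha):
    """Aligned interval payoff for a sequence whose length is a power of 2:
    the larger of (a) treating the whole sequence as one interval and
    (b) splitting at the mid-point and recursing on both halves."""
    n = len(bits)
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("length must be a power of 2")
    whole = abs(sum(bits)) - alpha * math.sqrt(n)
    if n == 1:
        return whole  # base case: 1 - alpha
    half = n // 2
    split = aligned_payoff(bits[:half], alpha) + aligned_payoff(bits[half:], alpha)
    return max(whole, split)


# Four +1 bits followed by four -1 bits, alpha = 1: splitting at the
# mid-point gives (4 - 2) + (4 - 2) = 4, better than 0 - sqrt(8).
aligned_example = aligned_payoff([1, 1, 1, 1, -1, -1, -1, -1], 1.0)
```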

$$\Pr_{x}[P^A_\alpha(x) \ge y] \quad (2.2)$$

$$\le \Pr_{x}[P^A_\alpha(x_1) + P^A_\alpha(x_2) \ge y] + \Pr_{x}[|h(x)| - \alpha\sqrt{2n} \ge y] \quad (2.3)$$

$$\le \Pr[F_{\mu,\sigma,n} + F'_{\mu,\sigma,n} \ge y] + \Pr_{x}[|h(x)| - \alpha\sqrt{2n} \ge y] \quad (2.4)$$

Here $F$ and $F'$ are independent random variables distributed as in Definition 12. We will show that the first and second term are each bounded by $\frac{1}{2}\Pr[F_{2n} \ge y]$, which is sufficient to prove Equation 2.1. Note that we only need to consider $y \ge \mu\sqrt{2n}$ since for smaller values of $y$ we have

$$\Pr_{x}[P^A_\alpha(x) \ge y] \le \Pr[F_{2n} \ge y] = 1.$$

Henceforth, we will use shorthands $f_n := f_{\mu,\sigma,n}$ and $F_n := F_{\mu,\sigma,n}$.
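The first term can be written as:

$$\Pr[F_n + F'_n \ge y] = \int_y^\infty \int_{-\infty}^{\infty} f_n(s) \cdot f_n(w - s)\, ds\, dw = \int_y^\infty \int_{\mu\sqrt{n}}^{w - \mu\sqrt{n}} f_n(s) \cdot f_n(w - s)\, ds\, dw$$

where the second equation follows from the fact that $f_n(s) = 0$ for $s < \mu\sqrt{n}$ and $f_n(w - s) = 0$ for $s > w - \mu\sqrt{n}$. Thus, we need to show for all $y \ge \mu\sqrt{2n}$:

$$\int_y^\infty \int_{\mu\sqrt{n}}^{w - \mu\sqrt{n}} f_n(s) \cdot f_n(w - s)\, ds\, dw \le \frac{1}{2} \Pr[F_{2n} \ge y]$$

$$\Longleftarrow \frac{1}{\sigma^2 n} \int_y^\infty \int_{\mu\sqrt{n}}^{w - \mu\sqrt{n}} \exp\left(-\frac{s - \mu\sqrt{n}}{\sigma\sqrt{n}}\right) \cdot \exp\left(-\frac{w - s - \mu\sqrt{n}}{\sigma\sqrt{n}}\right) ds\, dw \le \frac{1}{2} \exp\left(-\frac{y - \mu\sqrt{2n}}{\sigma\sqrt{2n}}\right)$$

$$\Longleftarrow \frac{1}{\sigma^2 n} \int_y^\infty \int_{\mu\sqrt{n}}^{w - \mu\sqrt{n}} \exp\left(-\frac{w - 2\mu\sqrt{n}}{\sigma\sqrt{n}}\right) ds\, dw \le \frac{1}{2} \exp\left(-\frac{y - \mu\sqrt{2n}}{\sigma\sqrt{2n}}\right)$$

$$\Longleftarrow \frac{1}{\sigma^2 n} \int_y^\infty (w - 2\mu\sqrt{n}) \exp\left(-\frac{w - 2\mu\sqrt{n}}{\sigma\sqrt{n}}\right) dw \le \frac{1}{2} \exp\left(-\frac{y - \mu\sqrt{2n}}{\sigma\sqrt{2n}}\right)$$

In the third line we implicitly assume that $y \ge 2\mu\sqrt{n}$, since otherwise the left hand side is less than 0 and the equation is satisfied.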



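Note that the integral is of the form $\int u\, e^{-cu}\, du$, which integrates to $-\left(\frac{u}{c} + \frac{1}{c^2}\right) e^{-cu}$. Thus, integrating and substituting $z := y - 2\mu\sqrt{n}$, we need to show for all $z \ge 0$,

$$\left(\frac{z}{\sigma\sqrt{n}} + 1\right) \exp\left(-\frac{z}{\sigma\sqrt{n}}\right) \le \frac{1}{2} \exp\left(-\frac{z + (\sqrt{2} - 1)\mu\sqrt{2n}}{\sigma\sqrt{2n}}\right)$$

$$\Longleftarrow \frac{2z}{\sigma\sqrt{n}} + 2 \le \exp\left(\frac{(\sqrt{2} - 1) z}{\sigma\sqrt{2n}}\right) \cdot \exp\left((\sqrt{2} - 1)\frac{-\mu}{\sigma}\right)$$

Substituting $w := \frac{z}{\sigma\sqrt{n}}$, we need for all $w \ge 0$,

$$2w + 2 \le \exp\left(\frac{(\sqrt{2} - 1) w}{\sqrt{2}}\right) \cdot \exp\left((\sqrt{2} - 1)\frac{-\mu}{\sigma}\right)$$

$$\Longleftarrow \frac{2w + 2}{\exp\left(\frac{(\sqrt{2} - 1) w}{\sqrt{2}}\right)} \le \exp\left((\sqrt{2} - 1)\frac{-\mu}{\sigma}\right)$$

The left hand side is maximized at $w = 1/\sqrt{2}$ and the value of left hand side at that point is around 2.78. Thus, if $(-\mu/\sigma) \ge 2.47$ then the equation is always satisfied.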



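We now turn to bounding the second term in Equation 2.4. We need to show for all $y \ge \mu\sqrt{2n}$,

$$\Pr_x\left[|h(x)| - \alpha\sqrt{2n} \ge y\right] \le \frac{1}{2} \Pr[F_{2n} \ge y]$$

$$\Longleftarrow \Pr\left[|B_{2n}| \ge y + \alpha\sqrt{2n}\right] \le \frac{1}{2} \Pr[F_{2n} \ge y]$$

$$\Longleftarrow \Pr\left[|B_{2n}| \ge (z + \alpha)\sqrt{2n}\right] \le \frac{1}{2} \Pr[F_{2n} \ge z\sqrt{2n}]$$

$$\Longleftarrow 2 \exp\left(-\frac{(z + \alpha)^2}{2}\right) \le \frac{1}{2} \Pr[F_{2n} \ge z\sqrt{2n}]$$

where the last line follows from Theorem 10 and in the second last line we substitute $z := y/\sqrt{2n}$.

Thus, we need to show for all $z \ge \mu$,

$$4 \exp\left(-\frac{(z + \alpha)^2}{2}\right) \le \exp\left(-\frac{z\sqrt{2n} - \mu\sqrt{2n}}{\sigma\sqrt{2n}}\right)$$

Substituting $w := z - \mu$, we need to show for all $w \ge 0$,

$$\exp\left(-\frac{(w + \mu + \alpha)^2}{2} + \frac{w}{\sigma}\right) \le 0.25$$

$$\Longleftarrow -\frac{(w + \mu + \alpha)^2}{2} + \frac{w}{\sigma} \le -1.4$$

The left hand side is maximized at $w + \mu + \alpha = 1/\sigma$ and for that value of $w$ the inequality is given by

$$-\frac{1}{2\sigma^2} + \frac{1/\sigma - \mu - \alpha}{\sigma} \le -1.4 \Longleftarrow \mu + \alpha \ge 1.4\sigma + \frac{0.5}{\sigma}$$

Also, recall that to bound the first term we needed $-\mu/\sigma \ge 2.47$. Let's set $\mu := -2.47\sigma$. Then we need

$$\alpha \ge (1.4 + 2.47)\sigma + \frac{0.5}{\sigma} = 3.87\sigma + \frac{0.5}{\sigma}$$

The right hand side is minimized at $\sigma = \sqrt{\frac{1}{2 \cdot 3.87}} \approx 0.36$, and substituting we get that $\alpha \approx 2.8$ is feasible. Recall that we also needed $\mu + \alpha \ge 1$ from the base case, which is already satisfied for this choice of parameters.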
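
The arithmetic in the last step can be reproduced in a few lines. This sketch takes the constants 2.47 and 1.4 from the derivation above as given and only checks the final minimization; the variable names are illustrative.

```python
import math

ratio = 2.47  # required lower bound on -mu / sigma (from the first term)
slack = 1.4   # constant with exp(-1.4) <= 0.25 (from the second term)


def required_alpha(sigma):
    """Right-hand side (slack + ratio) * sigma + 0.5 / sigma of the constraint on alpha."""
    return (slack + ratio) * sigma + 0.5 / sigma


# a * sigma + b / sigma is minimized at sigma = sqrt(b / a).
sigma_star = math.sqrt(0.5 / (slack + ratio))
alpha_star = required_alpha(sigma_star)
mu_star = -ratio * sigma_star

print(f"sigma = {sigma_star:.4f}, mu = {mu_star:.4f}, alpha = {alpha_star:.4f}")
```

With these values one can also confirm the two side conditions used in the proof: $\mu + \alpha \ge 1$ (base case) and $\mu + \sigma \le 0$ (non-positive mean of the bounding distribution).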



3 Algorithm and running time 

Theorem 14. The value of $P_\alpha(S)$ for a sequence $S$ of length $T$ can be computed by a dynamic program (DP) in time $O(T^3)$.

Proof. We give a simple $O(T^2)$ space and $O(T^3)$ time algorithm.

For every subinterval $(i, j)$ of the sequence, $i, j \in [T]$, the DP table stores $P_\alpha(S_{ij})$ where $S_{ij}$ is the subsequence of $S$ containing bits from position $i$ to position $j$, inclusive. For $i = j$, this value is always $1 - \alpha$. For $j > i$, to compute the value of $P_\alpha(S_{ij})$, we need to take the maximum over two quantities. The first quantity is $|h(S_{ij})| - \alpha\sqrt{j - i + 1}$, which corresponds to keeping the subsequence as a single interval. This can be readily computed in constant time if we pre-compute the height of every subsequence, which can be done in $O(T^2)$ space and time. The second quantity is the maximum over all $k \in \{i, i + 1, \ldots, j - 1\}$ of $P_\alpha(S_{ik}) + P_\alpha(S_{(k+1)j})$. This corresponds to splitting the subsequence after position $k$ and then recursively computing the best payoff in each of the two intervals created. This quantity can be computed in time $O(j - i + 1)$ since for each $k$ we just need to read off the appropriate values ($P_\alpha(S_{ik})$ and $P_\alpha(S_{(k+1)j})$) from the DP table.
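
The dynamic program in the proof can be sketched as follows. The function names are illustrative, positions are 0-based, and heights are read from prefix sums rather than from a separate $O(T^2)$ table, which is a small simplification of the pre-computation described above.

```python
import math


def interval_payoff_table(bits, alpha):
    """table[i][j] holds the interval payoff of bits[i..j] (0-based, inclusive).
    Uses O(T^2) space and O(T^3) time, filling shorter subintervals first."""
    T = len(bits)
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)  # prefix[m] = height of the first m bits

    table = [[0.0] * T for _ in range(T)]
    for length in range(1, T + 1):
        for i in range(0, T - length + 1):
            j = i + length - 1
            # Option 1: keep bits[i..j] as a single interval.
            best = abs(prefix[j + 1] - prefix[i]) - alpha * math.sqrt(length)
            # Option 2: split after position k and combine the two sub-results.
            for k in range(i, j):
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table


def interval_payoff(bits, alpha):
    """Interval payoff of the whole (non-empty) sequence."""
    return interval_payoff_table(bits, alpha)[0][len(bits) - 1]
```

For example, `interval_payoff([1, 1, 1, 1, -1, -1, -1, -1], 1.0)` evaluates the sequence of four $+1$ bits followed by four $-1$ bits, where the best partition splits at the mid-point.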

Theorem 15. There is a randomized algorithm that achieves the payoff guarantee $P_\alpha$ of the main theorem in expectation and spends $O(T^2)$ time per step. There is also a randomized algorithm that achieves payoff $P_{\alpha'}$ with $\alpha' = c\alpha$ and spends only $O(\log T)$ time per step. Here $c := \frac{\sqrt{2}}{\sqrt{2} - 1}$.

Both algorithms above use pre-computed information that takes $O(T^2)$ space and is computed in $O(T^4)$ time.

Proof. Let $X \in \{-1, 1\}^T$ be the input sequence we are required to predict. Using Observation 6 it is easy to see that the following algorithm achieves payoff $P_\alpha(X)$ in expectation. For every $t \in \{0, 1, \ldots, T - 1\}$:

1. Let $s \in \{-1, 1\}^t$ be the sequence of bits seen so far.

2. Let $U_t$ be a sequence drawn uniformly at random from $\{-1, 1\}^{T - t - 1}$ (independently for each $t$). Let $s_1 := s \cdot 1 \cdot U_t$ and $s_{-1} := s \cdot (-1) \cdot U_t$.

3. Make the prediction $\hat{b} := (P_\alpha(s_1) - P_\alpha(s_{-1}))/2$ for the next bit.
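
The three steps above can be sketched as a naive (non-precomputed) predictor. All names here are illustrative assumptions. To keep the block short, the payoff is evaluated with a prefix-based $O(T^2)$ dynamic program, which computes the same maximum over partitions for a whole sequence but is not the subinterval table of Theorem 14.

```python
import math
import random


def interval_payoff(bits, alpha):
    """Interval payoff of the whole sequence via a DP over prefixes:
    best[j] is the best payoff of a partition of the first j bits."""
    T = len(bits)
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)
    best = [0.0] + [-math.inf] * T
    for j in range(1, T + 1):
        for i in range(j):
            cand = best[i] + abs(prefix[j] - prefix[i]) - alpha * math.sqrt(j - i)
            if cand > best[j]:
                best[j] = cand
    return best[T]


def predict_next(seen, T, alpha, rng):
    """Steps 1-3: one fresh random completion, payoff with +1 and with -1, half the gap."""
    suffix = [rng.choice((-1, 1)) for _ in range(T - len(seen) - 1)]
    up = interval_payoff(list(seen) + [1] + suffix, alpha)
    down = interval_payoff(list(seen) + [-1] + suffix, alpha)
    return (up - down) / 2


def play(bits, alpha, seed=0):
    """Total payoff of the randomized predictor on bits (depends on the seed)."""
    rng = random.Random(seed)
    T = len(bits)
    total = 0.0
    for t in range(T):
        total += predict_next(bits[:t], T, alpha, rng) * bits[t]
    return total
```

Because flipping a single bit changes the value of any fixed partition by at most 2, the prediction always lies in $[-1, 1]$. The guarantee of the theorem is on the expected payoff over the random completions, so a single run of `play` may fall above or below $P_\alpha(X)$.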

The key idea is that we will draw the random sequences $U_t$ in advance and pre-compute enough information to make the prediction as fast as possible. For each $t \in \{0, 1, \ldots, T - 1\}$ we pre-compute the following information for each $U_t$:

1. $h(U')$ for every prefix $U'$ of $U_t$

2. $P_\alpha(U'')$ for every suffix $U''$ of $U_t$

The pre-computation takes $O(T^3)$ time for each $t$ and hence $O(T^4)$ time overall.

Let's describe how to use this pre-computed information to compute $P_\alpha(s_1)$ at time $t$ (the computation of $P_\alpha(s_{-1})$ is similar). Let $1 \le i \le t$ and $t + 2 \le j \le T$. Then it is easy to check that

$$P_\alpha(s_1) = \max_{i,j} \left[ P_\alpha(s_{1i}) + P_\alpha(U_{jT}) + \left| h(s_{(i+1)t}) + h(U_{(t+1)(j-1)}) \right| - \alpha \cdot \sqrt{j - i - 1} \right]$$

Here for a sequence $S$, $S_{ij}$ is the subsequence of $S$ containing bits from position $i$ to position $j$, inclusive. Note that we think of $U_t$ as being indexed from $t + 1$ to $T$, where the $(t + 1)$-th bit is 1 (since we are dealing with $s_1$). The second and fourth terms are part of our pre-computation. The first and third terms can be computed on the fly and stored in the table as we increase $t$ from 1 to $T$. Thus, for each $i$ and $j$ we can compute this expression in constant time and hence we can produce a prediction in $O(T^2)$ time per step.

The second part of the theorem is proved in a similar manner by using only aligned intervals for splitting the sequence (Definition 11) and observing that the number of aligned intervals spanning a given position is at most $O(\log T)$. The algorithm achieves payoff at least $P_{\alpha'}$ because of Lemma 1.

A Experimental results

In this section we describe our experimental setup and findings.

The first part of the experiment is to experimentally estimate the value of $\alpha_0$. In general we may think of $\alpha_0$ as a function of $T$. In Section 2 we saw that $\alpha_0(T)$ is bounded from above by an absolute constant for all $T$. In Section A.1 below we estimate the values of $\alpha_0$ for a range of $T$.

The second part of the experiment is to implement our algorithm and compare its performance against 3 other prediction algorithms. This is described in Section A.2 below.

A.1 Computation of $\alpha_0$

We denote by $\alpha_0(T)$ the minimum value of $\alpha$ such that the payoff function $P_\alpha$ is feasible for sequences of length $T$. For a particular $T$, this value can be computed using Theorem 5. While Theorem 5 requires us to compute the payoff function over all sequences of length $T$ (to compute the expectation), we can experimentally approximate this by taking sufficiently many random sequences of length $T$ and looking at the expectation of the sample. We are interested in $T = 389$, which is the number of minutes in a trading day for which we have returns data (there are 390 minutes in a typical trading day and the return for the first minute is undefined).



Note that the standard error of the sample mean is obtained as the sample standard deviation divided by $\sqrt{n}$, where $n = 400$ is the number of trials. The following chart shows the mean payoff and standard error for various values of $\alpha$ for $T = 389$.
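
The sampling procedure described here can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function names, the seed handling, and the use of a prefix-based $O(T^2)$ dynamic program for the payoff are all assumptions.

```python
import math
import random
import statistics


def interval_payoff(bits, alpha):
    """Interval payoff of the whole sequence via a DP over prefixes."""
    T = len(bits)
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)
    best = [0.0] + [-math.inf] * T
    for j in range(1, T + 1):
        for i in range(j):
            cand = best[i] + abs(prefix[j] - prefix[i]) - alpha * math.sqrt(j - i)
            if cand > best[j]:
                best[j] = cand
    return best[T]


def estimate_mean_payoff(T, alpha, trials, seed=0):
    """Sample mean of the payoff over uniformly random sequences of length T,
    with the standard error (sample standard deviation / sqrt(trials))."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        bits = [rng.choice((-1, 1)) for _ in range(T)]
        samples.append(interval_payoff(bits, alpha))
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(trials)
    return mean, std_err
```

Calling `estimate_mean_payoff(389, alpha, 400)` for a grid of `alpha` values and locating where the sample mean crosses zero mirrors the estimate described in the text; in pure Python this takes a while, so smaller `T` is more practical for quick experiments.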







From the figure we see that $\alpha = 1.96$ is a good estimate for $\alpha_0(T)$ for $T = 389$. The figure below shows estimated values of $\alpha_0$ for various $T$.

A.2 Comparison of predictive performance

The algorithms we consider are:

1. The baseline buy and hold strategy that achieves payoff equal to the height (height)

2. The algorithm described in this paper (interval)

3. Weighted Majority algorithm (WM)

4. The algorithm of [12] (Algorithm 4, section 5) (boundedloss)

5. An algorithm based on Auto Regressive Integrated Moving Average (arima)

Note that algorithms 2-4 are based on ideas from regret minimization with provable guarantees while the fifth is a commonly used model for predicting time series data. To implement the fifth algorithm we use the function auto.arima() in R, which is part of the library forecast.

The prediction task we consider is to predict the next minute returns for a 
stock over a single trading day using only the previous 1 minute returns of the 
given stock for the given day. More precisely, we define the price of a stock at 
a given time taking the average of the best bid price and best ask price at that 
time as reported by the New York Stock Exchange (NYSE). We perform this 
prediction experiment over 189 days for the following 5 US stocks/ETFs from 
various sectors: MSFT, GE, GLD, QQQ and WMT. This gives us performance 
data for each algorithm for a total of 389 x 189 x 5 = 367, 605 data points. The 
results obtained are shown in the figure below. 

[Figure: performance of each algorithm (first category on the axis: height) for each stock; recovered legend entries: GE, GLD, QQQ, WMT.]



We note that while our algorithm performs better in practice than other re- 
gret minimization based prediction algorithms with provable guarantees, it is 
outperformed by the ARIMA model. 

B Omitted Proofs 

For $s$ a sequence of bits of length at most $T$, let $R(s)$ denote a random string of length $T$ with prefix $s$; that is, append a random suffix to $s$ to make it of length $T$. Let $a.b$ denote the concatenation of $a$ and $b$. Let $f(D)$ denote the expected value of $f$ on a string drawn from $D$. Let $[T] = \{1, 2, \ldots, T\}$.

Observation 16 Let $A$ be an algorithm that guarantees a regret of at most $c \cdot \sqrt{x}$ on an interval of length $x$ for all sequences. Then there is a distribution $D$ over sequences of length $kx$ such that the expected regret of $A$ on $D$ is at least $\Omega(k \cdot \sqrt{x})$. Setting $k$ to be large enough, this implies that there is no prediction algorithm that can guarantee a regret of $O(\sqrt{|Y|})$ on all intervals $Y$ for all input sequences.

Proof. Let $S_1$ be the sequences of length $x$ with absolute height more than $2c\sqrt{x}$ and $S_2$ be all other sequences of length $x$. We know that the expected payoff of $A$ on a uniformly random sequence of length $x$ is 0. On the other hand, the payoff of $A$ on any sequence in $S_1$ is at least $c \cdot \sqrt{x}$. A random string of length $x$ falls into $S_1$ with probability $e^{-O(c^2)}$. Thus, the expected payoff of $A$ on a random string chosen from $S_2$ is at most $-c\sqrt{x}\, e^{-O(c^2)} = -\Omega(\sqrt{x})$.

Consider the distribution $D$ over sequences of length $kx$ which is just the concatenation of $k$ random, independent sequences from $S_2$. Then because $A$ has bounded regret in every interval of length $x$, by the same argument as above we would get that the expected payoff of $A$ on $D$ is at most $-\Omega(k \cdot \sqrt{x})$ and hence the expected regret is at least $\Omega(k \cdot \sqrt{x})$.

Lemma 2. If $P^A_\alpha$ is feasible then $P_{c\alpha}$ is also feasible, where $c := \frac{\sqrt{2}}{\sqrt{2} - 1}$.

Proof: Let $X_1, X_2, \ldots, X_k$ be a partition of a given sequence $S$. We split each interval $X_i$ into a disjoint union of aligned intervals $I_{i1}, \ldots, I_{il}$. We will then show that the inequality

$$\sum_{j=1}^{l} \sqrt{|I_{ij}|} \le c \cdot \sqrt{|X_i|}$$

always holds, where $|I|$ denotes the length of the interval $I$. This suffices to prove the lemma since $|h(X_i)| \le \sum_{j=1}^{l} |h(I_{ij})|$.

For notational simplicity, let $I = X_i$ and $x = |I|$. If $I$ is an aligned interval we are done; otherwise we write it as the minimal union of aligned intervals (take out the largest aligned interval in $I$ and repeat). There are three possibilities:

1. $I = I_1 \cup I_2$ is a union of two intervals of size $x/2$ each (e.g. the interval $[T/4 + 1, 3T/4]$)

2. $I = I_1 \cup I_2 \cup \ldots \cup I_l$, where each $I_j$ is of a different size. Note that all interval sizes on the right are powers of 2 and strictly less than $x$

3. $I = J \cup J'$ where each of $J$, $J'$ can be written as a union of intervals as in 1 or 2 above

In the first case,

$$\sqrt{|I_1|} + \sqrt{|I_2|} \le 2 \cdot \sqrt{x/2} = \sqrt{2} \cdot \sqrt{x}$$

In the second case,

$$\sum_{j=1}^{l} \sqrt{|I_j|} \le \frac{\sqrt{x}}{\sqrt{2} - 1}$$

In the third case,

$$\sum_{j} \sqrt{|I_j|} \le \frac{\sqrt{|J|} + \sqrt{|J'|}}{\sqrt{2} - 1} \le \frac{\sqrt{2} \cdot \sqrt{x}}{\sqrt{2} - 1}$$
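
The inequality in Lemma 2 can be checked numerically for every interval of a small sequence. The sketch below decomposes an interval into aligned pieces with a greedy left-to-right rule, which is assumed here to yield the same minimal decomposition as the one described in the proof; the function names are illustrative.

```python
import math


def aligned_decomposition(lo, hi, T):
    """Split the half-open interval (lo, hi], 0 <= lo < hi <= T with T a power
    of 2, into aligned pieces, returned as (start, end) pairs meaning (start, end]."""
    pieces = []
    while lo < hi:
        size = T
        # Largest power of 2 that is aligned at lo and still fits inside (lo, hi].
        while lo % size != 0 or lo + size > hi:
            size //= 2
        pieces.append((lo, lo + size))
        lo += size
    return pieces


def sqrt_length_sum(pieces):
    """Sum of square roots of the piece lengths."""
    return sum(math.sqrt(end - start) for start, end in pieces)


c = math.sqrt(2) / (math.sqrt(2) - 1)

# Worst ratio (sum of sqrt piece lengths) / sqrt(interval length) for T = 64.
T = 64
worst_ratio = max(
    sqrt_length_sum(aligned_decomposition(lo, hi, T)) / math.sqrt(hi - lo)
    for lo in range(T)
    for hi in range(lo + 1, T + 1)
)
```

For $T = 64$ the worst ratio stays below $c \approx 3.41$, consistent with the lemma.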



B.l Generalization to values of bt beyond [—1, 1] 

In many applications the values bf may not be bounded in a range such as [—1, 1] 
but could have unbounded values, as in the case when they are drawn from a 
normal distribution. We will now extend our algorithm to such a case. We will 
show that our guarantees continue to hold in a semi adversarial setting where an 
adversary chooses the signs of bt but its magnitude is chosen from distribution 
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with mean 1. Let D denote a distribution over magnitude of real numbers with 
mean 1 (and clearly with non- negative support). Let s denote a sequence of 
bits (as signs) bt- Let M{s) denote a sequence of real numbers where each real 
number mt is obtained by multiplying bt with a randomly and independently 
drawn value from D. 

Let / denote a desired payoff function on a sequence of real numbers. We will 
show a sufficient condition to achieve on a sequence drawn from M{s) an ex- 
pected payoff of /(Af (s)) ~ Eg^M{s)[f{S)]- In the prediction algorithm, instead 
of appending random bits, we append a numbers with random signs but with 
magnitudes drawn from D. Given a sequence of real numbers m. Let C{m) de- 
note a random completion of to to a sequence of length T by appending numbers 
drawn randomly from D and with a randomly chosen sign (+1,-1). 

Theorem 17. Given a payoff function $f$ defined on a sequence of real numbers,
if $E_s[f(M(s))] \le 0$, then there is a prediction algorithm whose expected payoff
on a string drawn from $M(s)$ is at least $f(M(s))$. This is obtained by betting
$\hat{b}_t = (f(C(m.[+1])) - f(C(m.[-1])))/2$, where $m$ is the sequence seen so far.
Note that $\hat{b}_t \in [-1, 1]$ as long as, for all $m$, $|f(C(m.[+1])) - f(C(m.[-1]))| \le 2$.

Proof. Let $s$ denote the sequence of signs seen so far. As in Cover's proof, we can
show that setting $\hat{b}_t = (f(M(R(s.[+1]))) - f(M(R(s.[-1]))))/2$ ensures that our
expected payoff at time $t$ is at least $f(M(R(s)))$.

Also note that $(f(C(m.[+1])) - f(C(m.[-1])))/2$ is equal in expectation
to $(f(M(R(s.[+1]))) - f(M(R(s.[-1]))))/2$, as $m$ is distributed as $M(s)$.
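As a rough illustration of the betting rule in Theorem 17, the sketch below estimates the bet by averaging over random completions. It is not the authors' implementation. It assumes that $D$ is the exponential distribution with mean 1, that `m.[+1]` means appending the number $+1$ to the sequence seen so far, and it uses a hypothetical payoff function `clipped_sum`, chosen so that consecutive values differ by at most 2. The same random suffix is reused for both branches, which is a variance-reduction choice of this sketch.

```python
import random

def draw_magnitude(rng):
    # Assumption: D is the exponential distribution with mean 1.
    return rng.expovariate(1.0)

def random_suffix(length, rng):
    """Numbers with random signs and magnitudes drawn from D."""
    return [rng.choice((-1.0, 1.0)) * draw_magnitude(rng) for _ in range(length)]

def estimate_bet(f, m, T, samples=2000, seed=0):
    """Monte Carlo estimate of (f(C(m.[+1])) - f(C(m.[-1]))) / 2.

    m is the list of real numbers seen so far and T the total length.
    Both branches share the same random suffix in every sample.
    """
    rng = random.Random(seed)
    remaining = T - len(m) - 1
    total = 0.0
    for _ in range(samples):
        suffix = random_suffix(remaining, rng)
        up = f(m + [1.0] + suffix)
        down = f(m + [-1.0] + suffix)
        total += (up - down) / 2.0
    return total / samples

def clipped_sum(seq, cap=3.0):
    # Hypothetical payoff function: the total, clipped to [-cap, cap].
    return max(-cap, min(cap, sum(seq)))

bet = estimate_bet(clipped_sum, [0.5, -1.2], T=10)
print(-1.0 <= bet <= 1.0)
```

Because clipping is 1-Lipschitz and the two branches differ by 2 in their sum, every sampled difference lies in $[-2, 2]$, so the estimated bet stays in $[-1, 1]$.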



C Trade-off with two experts 

Equivalence between the bit-prediction and two experts problem. The following 
equivalence is shown in [24]. We redo the same proof here for the DP based 
solution. 

In the above formulation we can define loss to be the maximum negative payoff,
and we can obtain a tradeoff between regret $R$ and loss $L$. This tradeoff is
useful in obtaining a tradeoff between two different regrets when there are two experts.
In each round each expert has a payoff in the range $[0, 1]$ that is unknown to
the algorithm. For two experts, let $b_{1t}, b_{2t}$ denote the payoffs of the two experts.
The algorithm pulls each arm (expert) with probability $\hat{b}_{1t}, \hat{b}_{2t} \in [0, 1]$
respectively, where $\hat{b}_{1t} + \hat{b}_{2t} = 1$. The payoff of the algorithm is
$A = \sum_{t=1}^{T} \hat{b}_{1t} b_{1t} + \hat{b}_{2t} b_{2t}$.
Let $X_i = \sum_{t=1}^{T} b_{it}$. We will study the regret trade-off $R_1, R_2$ with respect to these
two experts, which means that $A \ge X_1 - R_1$ and $A \ge X_2 - R_2$.
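The quantities in this setup can be computed directly. The following sketch, with hypothetical function and variable names, evaluates the algorithm's payoff $A$ and the smallest $R_1, R_2$ satisfying the two inequalities for a given sequence of expert payoffs and arm-pulling probabilities.

```python
def two_expert_regrets(payoffs, probs_first):
    """Compute the payoff A and the regrets to each of two experts.

    payoffs: list of (b1, b2) pairs, each payoff in [0, 1].
    probs_first: probability of pulling the first arm in each round;
    the second arm is pulled with the complementary probability.
    """
    A = sum(p * b1 + (1 - p) * b2 for (b1, b2), p in zip(payoffs, probs_first))
    X1 = sum(b1 for b1, _ in payoffs)
    X2 = sum(b2 for _, b2 in payoffs)
    # Smallest R1, R2 with A >= X1 - R1 and A >= X2 - R2.
    return A, X1 - A, X2 - A

A, R1, R2 = two_expert_regrets([(1, 0), (1, 0), (0, 1)], [0.5, 0.5, 0.5])
print(A, R1, R2)  # 1.5 0.5 -0.5
```

A negative value, as for the second expert here, means the algorithm outperformed that expert on this sequence.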

One question that has been asked before is a tradeoff between regret to the 
average and regret to the max |17ll2j . With two experts, the regret /loss tradeoff 
in the sequence prediction problem is related to regret trade-off for the two 
experts problem. Let $R, L$ be feasible upper bounds on the regret and loss in
the sequence prediction problem in the worst case. Let $R_o, L_o$ be feasible upper
bounds on the regret and loss in the version of the sequence prediction problem
with one sided bets (that is, $\hat{b}_t$ cannot be negative; the feasible payoff curves for
this case are a simple variant of $F_{c_1,c_2}$ where $F'$ is capped to lie in $[0, 1]$). Let $R_1$,
$R_2$ be feasible upper bounds on the regret with respect to expert one and expert two
in the worst case. Let $R_m, R_a$ be feasible upper bounds on the regret to the max
and regret to the average with two experts in the worst case.

Lemma 3 ([24]). $R, L$ is feasible in the sequence prediction problem if and only if
$R_m = R/2$, $R_a = L/2$ is feasible for regret to the max and regret to the average in the
two experts setting.

$R_o, L_o$ is feasible in the sequence prediction problem (with one sided bets) if
and only if $R_1 = L_o$, $R_2 = R_o$ is feasible for regret to the first expert and regret
to the second expert in the two experts setting.

Proof. First we look at the reduction from the regret to the average and regret
to the max problem. We can reduce this problem to our sequence prediction
problem by producing at time $t$, $b_t = (b_{1t} - b_{2t})/2$. A bet $\hat{b}_t$ in our prediction
problem can be translated back to probabilities $\hat{b}_{1t} = (1 + \hat{b}_t)/2$ and
$\hat{b}_{2t} = (1 - \hat{b}_t)/2$ for the two experts. A payoff $A$ in the original problem gets
translated into payoff
$\sum_t b_{1t}(1 + \hat{b}_t)/2 + b_{2t}(1 - \hat{b}_t)/2 = (X_1 + X_2)/2 + A$ in the two experts case. In
this reduction the loss $L$ gets mapped to $R_a$ and the regret $R$ gets mapped to
$R_m$. However, note that $b_t$ is now in the range $[-1/2, 1/2]$. Therefore we need to scale
it by 2 to reduce it to the standard version of the original problem. Conversely,
given a sequence $b_t$ of the prediction problem we can convert it into two experts
with payoffs $b_{1t} = (1 + b_t)/2$, $b_{2t} = (1 - b_t)/2$. The average expert has payoff
$T/2$. A payoff of $A$ in the prediction problem can be obtained from a sequence of
arm pulling probabilities with payoff $T/2 + A/2$ by interpreting the arm pulling
probabilities as $(1 \pm \hat{b}_t)/2$, since
$\sum_t \frac{1 + \hat{b}_t}{2} \cdot \frac{1 + b_t}{2} + \frac{1 - \hat{b}_t}{2} \cdot \frac{1 - b_t}{2} = T/2 + A/2$.
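The payoff identity used in the first reduction can be checked numerically. The sketch below is only a sanity check on arbitrary test data (random expert payoffs in $[0, 1]$ and random bets in $[-1, 1]$, with a fixed seed); the function name is hypothetical.

```python
import random

def max_average_reduction(b1, b2, bets):
    """Return both sides of the payoff identity for the first reduction.

    b1, b2: expert payoffs in [0, 1]; bets: prediction bets in [-1, 1].
    """
    # Sequence handed to the prediction problem: b_t = (b1_t - b2_t) / 2.
    b = [(x - y) / 2 for x, y in zip(b1, b2)]
    # Payoff A of the bets in the prediction problem.
    A = sum(bh * bt for bh, bt in zip(bets, b))
    # Payoff when arms are pulled with probabilities (1 + bet)/2 and (1 - bet)/2.
    experts_payoff = sum(x * (1 + bh) / 2 + y * (1 - bh) / 2
                         for x, y, bh in zip(b1, b2, bets))
    return experts_payoff, (sum(b1) + sum(b2)) / 2 + A

rng = random.Random(1)  # fixed seed, arbitrary test data
T = 50
b1 = [rng.random() for _ in range(T)]
b2 = [rng.random() for _ in range(T)]
bets = [rng.uniform(-1.0, 1.0) for _ in range(T)]
lhs, rhs = max_average_reduction(b1, b2, bets)
print(abs(lhs - rhs) < 1e-9)
```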

Next we look at regrets $R_1, R_2$ with respect to the two experts. Given a
sequence of payoffs for the two experts, we can reduce it to a sequence for the
(one sided) prediction problem by setting $b_t = b_{2t} - b_{1t}$. A bet $\hat{b}_t$ in the
prediction problem can be translated to probabilities $\hat{b}_{1t} = 1 - \hat{b}_t$ and
$\hat{b}_{2t} = \hat{b}_t$ for the two experts. A payoff $A$ in the prediction problem gets translated into payoff
$\sum_t (1 - \hat{b}_t) b_{1t} + \hat{b}_t b_{2t} = X_1 + A$ in the two experts case, where a zero regret in
the prediction would correspond to $A = X_2 - X_1$. Thus a loss of $L_o$ translates
to a regret $R_1 = L_o$ with respect to the first arm, and a regret $R_o$ translates to
a regret $R_2 = R_o$ with respect to the second arm. Thus if $R_o, L_o$ is feasible then
so is $R_1 = L_o$, $R_2 = R_o$. Conversely, given an instance of the prediction problem
with one sided bets, we can convert it to a version of the two armed problem
by setting $b_{2t} = b_t$, $b_{1t} = 0$ if $b_t > 0$ and $b_{2t} = 0$, $b_{1t} = -b_t$ otherwise. A bet $\hat{b}_t$
is used in our original problem if the arms are pulled with probabilities $1 - \hat{b}_t$
and $\hat{b}_t$ respectively. The payoff in the experts problem is $X_1 + \sum_t \hat{b}_t (b_{2t} - b_{1t})$.
So regrets $R_1, R_2$ will translate to $L_o = R_1$, $R_o = R_2$ in the prediction problem
with one sided bets.
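The converse direction of the one sided reduction can be sketched in a few lines. The code below, with hypothetical names, splits a prediction sequence into two expert payoff sequences as described and checks that pulling the arms with probabilities $1 - \hat{b}_t$ and $\hat{b}_t$ gives payoff $X_1 + A$. The input values are arbitrary test data.

```python
def split_into_experts(b):
    """Map each b_t to (b1_t, b2_t) with b2_t - b1_t = b_t and both >= 0."""
    return [(0.0, x) if x > 0 else (-x, 0.0) for x in b]

def one_sided_payoffs(b, bets):
    """Return (payoff in the experts problem, X1 + A) for one sided bets.

    b: prediction sequence with values in [-1, 1]; bets: bets in [0, 1].
    """
    experts = split_into_experts(b)
    X1 = sum(b1 for b1, _ in experts)
    # Payoff A of the bets in the prediction problem.
    A = sum(bh * x for bh, x in zip(bets, b))
    # Arms are pulled with probabilities 1 - bet and bet respectively.
    experts_payoff = sum((1 - bh) * b1 + bh * b2
                         for (b1, b2), bh in zip(experts, bets))
    return experts_payoff, X1 + A

lhs, rhs = one_sided_payoffs([1.0, -1.0, 0.5, -0.25], [0.5, 0.0, 1.0, 0.25])
print(lhs, rhs)  # 2.1875 2.1875
```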